Smart Paper Notes

Linear programming word problems formulation using EnsembleCRF NER labeler and T5 text generator with data augmentations

JiangLong He , Mamatha N , Shiv Vignesh , Deepak Kumar , Akshay Uppal

Categories: Natural Language Processing | Artificial Intelligence

2022-12-30

We propose an ensemble approach to predict the labels in linear programming word problems. The entity identification and the meaning representation are two types of tasks to be solved in the NL4Opt competition. We propose the ensembleCRF method to identify the named entities for the first task. We found that single models didn't improve for the given task in our analysis. A set of prediction models predict the entities. The generated results are combined to form a consensus result in the ensembleCRF method. We present an ensemble text generator to produce the representation sentences for the second task. We thought of dividing the problem into multiple small tasks due to the overflow in the output. A single model generates different representations based on the prompt. All the generated text is combined to form an ensemble and produce a mathematical meaning of a linear programming problem.
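The consensus step described above can be sketched as a per-token majority vote. This is only an illustrative sketch: the abstract does not say how the ensembleCRF method combines the predictions, so the voting rule, the tie-breaking rule, the function name `consensus_labels`, and the example tags are all assumptions.

```python
from collections import Counter


def consensus_labels(predictions):
    """Merge per-token label sequences from several models by majority vote.

    predictions: list of label sequences, one per model, all the same length.
    Ties go to the tied label that appears first in model order (an assumed rule).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    merged = []
    for votes in zip(*predictions):  # votes = one label per model for this token
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        merged.append(next(v for v in votes if counts[v] == best))
    return merged


# Hypothetical tokens and tags, for illustration only.
tokens = ["at", "most", "50", "chairs"]
model_outputs = [
    ["B-CONST_DIR", "I-CONST_DIR", "B-LIMIT", "B-VAR"],
    ["B-CONST_DIR", "O", "B-LIMIT", "B-VAR"],
    ["O", "I-CONST_DIR", "B-LIMIT", "O"],
]
print(list(zip(tokens, consensus_labels(model_outputs))))
```

A vote like this only helps when the individual models make different mistakes; if they all fail on the same tokens, the consensus inherits the error.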

translated by Google Translate

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
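As a rough illustration of the patch-based strategy mentioned above, the sketch below tiles a 2D image into fixed-size patches so that each piece can be processed separately. It is a minimal sketch under stated assumptions: the survey does not describe any specific implementation, and the function name `iter_patches`, the non-overlapping tiling, and the list-of-rows image format are all hypothetical choices.

```python
def iter_patches(image, patch_h, patch_w):
    """Yield (top, left, patch) tiles that cover a 2D image given as a list of rows.

    Tiles do not overlap. Tiles on the bottom and right edges are smaller
    when the image size is not a multiple of the patch size.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            yield top, left, patch


# A toy 4x6 "image" whose pixel values encode their own position.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
patches = list(iter_patches(image, patch_h=2, patch_w=4))
for top, left, patch in patches:
    print(top, left, len(patch), len(patch[0]))
```

Real pipelines often use overlapping or randomly sampled patches instead; the plain grid here is only the simplest variant.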

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages

Nalin Kumar , Deepak Kumar , Subhankar Mishra

Categories: Natural Language Processing | Artificial Intelligence

2022-06-09

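Neural Machine Translation (NMT) models have been effective on large bilingual datasets. However, existing methods and techniques show that a model's performance depends heavily on the number of examples in the training data. For many languages, having corpora of that size is a far-fetched dream. Taking inspiration from monolingual speakers who explore new languages using dictionaries, we investigate the applicability of bilingual dictionaries to languages with an extremely small bilingual corpus, or none at all. In this paper, we explore methods that use bilingual dictionaries with an NMT model to improve translation for extremely low-resource languages. We extend this work to multilingual systems, which exhibit zero-shot properties. We present a detailed analysis of the effects of dictionary quality, training dataset size, language family, etc., on translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.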

Adapting Rapid Motor Adaptation for Bipedal Robots

Ashish Kumar , Zhongyu Li , Jun Zeng , Deepak Pathak , Koushil Sreenath , Jitendra Malik

Categories: Robotics | Artificial Intelligence | Computer Vision | Machine Learning

2022-05-30

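Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable, so it is harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control and extend them to bipedal robots. Similar to existing works, we start with a base policy that produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to adapt rapidly online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy, which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy to the imperfect extrinsics estimator by finetuning it with model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy that enables the bipedal robot Cassie to walk in a variety of real-world scenarios beyond what it has seen during training. Videos and results at https://ashish-kmr.github.io/a-rma/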

Coupling Vision and Proprioception for Navigation of Legged Robots

Zipeng Fu , Ashish Kumar , Ananye Agarwal , Haozhi Qi , Jitendra Malik , Deepak Pathak

Categories: Robotics | Artificial Intelligence | Computer Vision | Machine Learning

2021-12-03

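We exploit the complementary strengths of vision and proprioception to achieve point-goal navigation in a legged robot. Legged systems can traverse more complex terrain than wheeled robots, but to fully exploit this capability, the high-level path planner in the navigation system needs to be aware of the walking capabilities of the low-level locomotion policy on varying terrains. We achieve this by using proprioceptive feedback to estimate the safe operating limits of the walking policy, and to sense unexpected obstacles and terrain properties, such as the smoothness or softness of the ground, that vision may miss. The navigation system uses onboard cameras to generate an occupancy map and a corresponding cost map for reaching the goal. An FMM (Fast Marching Method) planner then generates a target path. The velocity command generator takes this path as input and produces the desired velocity for the locomotion policy, using as additional constraints the inputs from the safety advisor, unexpected obstacles, and terrain-determined speed limits. We show superior performance compared to wheeled robot (LoCoBot) baselines and to other baselines that have disjoint high-level planning and low-level control. We also show a real-world deployment of our system on a quadruped robot with onboard sensors and compute. Videos at https://navigation-locomotion.github.io/camera-ready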

On the Opportunities and Risks of Foundation Models

Rishi Bommasani , Drew A. Hudson , Ehsan Adeli , Russ Altman , Simran Arora , Sydney von Arx , Michael S. Bernstein , Jeannette Bohg , Antoine Bosselut , Emma Brunskill

Categories: Machine Learning | Artificial Intelligence

2021-08-16

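AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and can be adapted to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of, given their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.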

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

Gowtham Ramesh , Sumanth Doddapaneni , Aravinth Bheemaraj , Mayank Jobanputra , Raghavan AK , Ajitesh Sharma , Sujit Sahoo , Harshita Diddee , Mahalakshmi J , Divyanshu Kakwani

Categories: Natural Language Processing

2021-04-12

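We present Samanantar, the largest publicly available parallel corpora collection for Indic languages. The collection contains a total of 49.7 million sentence pairs between English and 11 Indic languages (from two language families). Specifically, we compile 12.4 million sentence pairs from existing, publicly available parallel corpora, and additionally mine 37.4 million sentence pairs from the web, resulting in a 4x increase. We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences. Human evaluation of samples from the newly mined corpora validates the high quality of the parallel sentences across the 11 languages. Further, we extract 83.4 million sentence pairs between all 55 Indic language pairs from the English-centric parallel corpus, using English as the pivot language. We trained multilingual NMT models spanning all these languages on Samanantar; they outperform existing models and baselines on publicly available benchmarks such as FLORES, establishing the utility of Samanantar. Our data and models are publicly available at https://indicnlp.ai4bharat.org/samanantar/ and we hope they will help advance research in NMT and multilingual NLP.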
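The pivot step described above can be sketched as a join on the shared English sentence. This is a simplified illustration, not the authors' pipeline: the function names, the dictionary-based corpus format (one translation per English sentence), and the placeholder sentences are assumptions. The second helper only checks the arithmetic that 11 languages give 55 unordered language pairs.

```python
from itertools import combinations


def pivot_pairs(en_to_x, en_to_y):
    """Pair sentences in languages X and Y that translate the same English sentence.

    en_to_x, en_to_y: dicts mapping an English sentence to its translation.
    Returns a sorted list of (x_sentence, y_sentence) tuples.
    """
    shared = en_to_x.keys() & en_to_y.keys()  # English sentences present in both corpora
    return sorted((en_to_x[en], en_to_y[en]) for en in shared)


def count_language_pairs(num_languages):
    """Number of unordered pairs that can be formed from num_languages languages."""
    return len(list(combinations(range(num_languages), 2)))


# Placeholder strings stand in for real Hindi and Tamil sentences.
en_hi = {"Hello.": "<hi-1>", "Thank you.": "<hi-2>", "Good night.": "<hi-3>"}
en_ta = {"Hello.": "<ta-1>", "Thank you.": "<ta-2>", "See you.": "<ta-4>"}

print(pivot_pairs(en_hi, en_ta))
print(count_language_pairs(11))
```

Only English sentences that occur in both English-centric corpora produce a pair, which is why the pivoted corpus depends on how much the English sides overlap.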

e-Inu: Simulating A Quadruped Robot With Emotional Sentience

Abhiruph Chakravarty , Jatin Karthik Tripathy , Sibi Chakkaravarthy S , Aswani Kumar Cherukuri , S. Anitha , Firuz Kamalov , Annapurna Jonnalagadda

Categories: Robotics | Machine Learning

2023-01-03

Quadruped robots are currently used in industrial robotics as mechanical aid to automate several routine tasks. However, presently, the usage of such a robot in a domestic setting is still very much a part of the research. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from the speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost at par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

Santhosh Kumar Ramakrishnan , Ziad Al-Halah , Kristen Grauman

Categories: Computer Vision

2023-01-02

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
